Add pipeline queue monitoring with Prometheus metrics#497

Draft
Copilot wants to merge 3 commits into main from copilot/fix-471

Contributor

Copilot AI commented Sep 23, 2025

This PR implements comprehensive queue monitoring for Zeno's processing pipeline, allowing operators to identify bottlenecks by tracking the queue sizes of each component through Prometheus metrics.

Overview

The implementation adds real-time monitoring of all pipeline channels with both Prometheus and TUI integration. This enables operators to see which components are slowing down the crawl by having the largest queues.

Key Features

Pipeline Component Tracking

All 8 pipeline components are now monitored with descriptive names:

  • reactor_to_preprocessor - Items flowing from reactor to preprocessor
  • preprocessor_to_archiver - Items flowing from preprocessor to archiver
  • archiver_to_postprocessor - Items flowing from archiver to postprocessor
  • postprocessor_to_finisher - Items flowing from postprocessor to finisher
  • finisher_to_source - Finished items returned to source (HQ/LQ)
  • source_to_finisher - New items from source to finisher
  • hq_outlinks_finish - HQ outlinks finish channel
  • hq_outlinks_produce - HQ outlinks produce channel

Prometheus Metrics Integration

Component queue sizes are exposed as Prometheus metrics with the following format:

# HELP zeno_component_queue_size Size of component queues in the processing pipeline
# TYPE zeno_component_queue_size gauge
zeno_component_queue_size{component="reactor_to_preprocessor",hostname="host",project="proj",version="ver"} 0
zeno_component_queue_size{component="preprocessor_to_archiver",hostname="host",project="proj",version="ver"} 2
zeno_component_queue_size{component="archiver_to_postprocessor",hostname="host",project="proj",version="ver"} 1
...
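
As a rough illustration of how the output above is shaped, here is a minimal standard-library sketch that renders a map of queue sizes in the Prometheus text exposition format. The function name `formatQueueMetrics` is hypothetical, and the PR itself most likely relies on a Prometheus client library rather than hand-written formatting; the `hostname`, `project`, and `version` values are the placeholders from the sample.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatQueueMetrics is a hypothetical helper (not taken from the PR) that
// renders queue sizes as gauge samples in the Prometheus text format.
func formatQueueMetrics(sizes map[string]int, hostname, project, version string) string {
	// Sort component names so the output is deterministic.
	names := make([]string, 0, len(sizes))
	for name := range sizes {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString("# HELP zeno_component_queue_size Size of component queues in the processing pipeline\n")
	b.WriteString("# TYPE zeno_component_queue_size gauge\n")
	for _, name := range names {
		// %q is adequate for simple ASCII label values; a real exporter
		// should apply the exposition format's own escaping rules.
		fmt.Fprintf(&b,
			"zeno_component_queue_size{component=%q,hostname=%q,project=%q,version=%q} %d\n",
			name, hostname, project, version, sizes[name])
	}
	return b.String()
}

func main() {
	sizes := map[string]int{
		"reactor_to_preprocessor":   0,
		"preprocessor_to_archiver":  2,
		"archiver_to_postprocessor": 1,
	}
	fmt.Print(formatQueueMetrics(sizes, "host", "proj", "ver"))
}
```

Each channel becomes one sample of the same gauge, distinguished only by its `component` label, so a single query can compare all queues.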

TUI Integration

Queue sizes are now displayed in the stats panel of the terminal interface, refreshing every 500ms alongside the existing metrics.

Usage Examples

# Start crawl with Prometheus metrics
./Zeno get url --api --prometheus --api-port 9000 https://example.com

# Query component queue sizes
curl http://localhost:9000/metrics | grep component_queue

# Use with Prometheus/Grafana for monitoring and alerting
# Use TUI for live monitoring
./Zeno get url --tui https://example.com
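
For spotting the bottleneck programmatically rather than with grep, the following sketch scans metrics text and reports the component with the largest queue. The helper names `largestQueue` and `labelValue` are hypothetical, and the parsing assumes the simple line shape shown in this description (no timestamps, no escaped quotes in label values); the sample input is copied from the example output above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// labelValue extracts the value of one label from a sample line.
// Assumption: the value contains no escaped double quotes.
func labelValue(line, label string) string {
	key := label + `="`
	start := strings.Index(line, key)
	if start < 0 {
		return ""
	}
	start += len(key)
	end := strings.Index(line[start:], `"`)
	if end < 0 {
		return ""
	}
	return line[start : start+end]
}

// largestQueue returns the component whose zeno_component_queue_size
// sample has the highest value. found is false if no sample matched.
func largestQueue(metrics string) (component string, size float64, found bool) {
	const prefix = "zeno_component_queue_size{"
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue // skips comments and unrelated metrics
		}
		// Assumption: the sample value is the last space-separated field.
		idx := strings.LastIndex(line, " ")
		if idx < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[idx+1:], 64)
		if err != nil {
			continue
		}
		if !found || v > size {
			component, size, found = labelValue(line, "component"), v, true
		}
	}
	return component, size, found
}

func main() {
	sample := `# TYPE zeno_component_queue_size gauge
zeno_component_queue_size{component="reactor_to_preprocessor",hostname="host",project="proj",version="ver"} 0
zeno_component_queue_size{component="preprocessor_to_archiver",hostname="host",project="proj",version="ver"} 2
zeno_component_queue_size{component="archiver_to_postprocessor",hostname="host",project="proj",version="ver"} 1
`
	if name, size, ok := largestQueue(sample); ok {
		fmt.Printf("bottleneck: %s (%.0f items)\n", name, size)
	}
	// prints "bottleneck: preprocessor_to_archiver (2 items)"
}
```

In practice the input would be the body returned by the /metrics endpoint; a Prometheus query over the same gauge gives the equivalent answer with history.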

Technical Implementation

  • Thread-safe: Uses RWMutex for concurrent access to channel tracking
  • Low overhead: Minimal performance impact when not actively monitoring
  • Dependency injection: Testable design with proper separation of concerns
  • Real-time: Queue sizes calculated using len(channel) for instant feedback
  • Prometheus integration: Updates every 250ms alongside other stats
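
The tracking approach described in the list above can be sketched as follows. This is an illustration under assumptions, not the code from this PR: `QueueTracker`, `Register`, `Track`, and `Snapshot` are hypothetical names. It shows an RWMutex-guarded registry whose snapshot reads each channel's current length with `len`.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// QueueTracker is a hypothetical registry mapping a component name to a
// function that reports the current length of that component's channel.
type QueueTracker struct {
	mu     sync.RWMutex
	queues map[string]func() int
}

func NewQueueTracker() *QueueTracker {
	return &QueueTracker{queues: make(map[string]func() int)}
}

// Register stores a size function under name, replacing any previous one.
func (t *QueueTracker) Register(name string, size func() int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.queues[name] = size
}

// Track registers a channel of any element type. len(ch) reports the
// number of buffered items, so an unbuffered channel always reads 0.
func Track[T any](t *QueueTracker, name string, ch chan T) {
	t.Register(name, func() int { return len(ch) })
}

// Snapshot returns the current size of every registered queue.
// Readers only take the read lock, so concurrent snapshots do not block.
func (t *QueueTracker) Snapshot() map[string]int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	out := make(map[string]int, len(t.queues))
	for name, size := range t.queues {
		out[name] = size()
	}
	return out
}

func main() {
	tracker := NewQueueTracker()
	toPreprocessor := make(chan string, 8)
	Track(tracker, "reactor_to_preprocessor", toPreprocessor)

	toPreprocessor <- "https://example.com/a"
	toPreprocessor <- "https://example.com/b"

	snap := tracker.Snapshot()
	names := make([]string, 0, len(snap))
	for name := range snap {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s=%d\n", name, snap[name])
	}
}
```

Because `len` on a channel is a cheap read, nothing is computed until something asks for a snapshot, which is consistent with the low-overhead claim. A periodic updater (for example every 250ms) would call `Snapshot` and copy the values into the exported gauges.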

Benefits

  1. Bottleneck Identification: Quickly identify which components are slower than others
  2. Performance Optimization: Data-driven capacity planning and tuning
  3. Monitoring Integration: Native Prometheus metrics enable integration with standard monitoring tools
  4. Historical Analysis: Time-series data for trend analysis and alerting

This implementation provides pipeline visibility for operators to identify and resolve performance bottlenecks effectively using standard monitoring infrastructure.

Fixes #471.


codecov-commenter commented Sep 23, 2025

Codecov Report

❌ Patch coverage is 72.72727% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.42%. Comparing base (1f0d58f) to head (ca40c1f).
⚠️ Report is 50 commits behind head on main.

Files with missing lines            Patch %   Lines
internal/pkg/stats/methods.go       16.66%    4 Missing and 1 partial ⚠️
internal/pkg/stats/prometheus.go    0.00%     5 Missing ⚠️
internal/pkg/stats/stats.go         0.00%     5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #497      +/-   ##
==========================================
+ Coverage   56.39%   56.42%   +0.03%     
==========================================
  Files         130      131       +1     
  Lines        8120     8167      +47     
==========================================
+ Hits         4579     4608      +29     
- Misses       3172     3188      +16     
- Partials      369      371       +2     
Flag        Coverage Δ
e2etests    40.64% <47.27%> (-0.01%) ⬇️
unittests   29.52% <49.09%> (+0.20%) ⬆️


Co-authored-by: CorentinB <5089772+CorentinB@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Which components are the slowest?" to "Add pipeline queue monitoring with warcprox-compatible API endpoint" on Sep 23, 2025
Copilot AI requested a review from CorentinB September 23, 2025 08:42
@CorentinB
Collaborator

@copilot a warcprox-compliant endpoint wasn't requested. Add things like component_queues information to the Prometheus endpoint instead.

Co-authored-by: CorentinB <5089772+CorentinB@users.noreply.github.com>
Contributor Author

Copilot AI commented Sep 23, 2025

@copilot a warcprox-compliant endpoint wasn't requested. Add things like component_queues information to the Prometheus endpoint instead.

Implemented as requested. Removed the warcprox-compliant /status endpoint and added component queue information to the Prometheus metrics instead.

The component queue sizes are now available as zeno_component_queue_size gauge metrics with component labels, updating every 250ms alongside other stats. This provides better integration with standard monitoring infrastructure.

Usage: curl http://localhost:9000/metrics | grep component_queue

Commit: ca40c1f

Copilot AI changed the title from "Add pipeline queue monitoring with warcprox-compatible API endpoint" to "Add pipeline queue monitoring with Prometheus metrics" on Sep 23, 2025

Development

Successfully merging this pull request may close these issues.

Which components are the slowest?

3 participants